Color space scalable video coding and decoding method and apparatus for the same

ABSTRACT

A color space scalable video coding and decoding method and an apparatus for the same are disclosed that can adjust color components or color depth according to the performance of a decoder side. The color space scalable video coding method includes generating transform coefficients by removing the temporal redundancy and spatial redundancy of input video frames, quantizing the transform coefficients, generating a bit stream by entropy coding the quantized transform coefficients, and generating a color space scalable bit stream that includes the bit stream and position information of luminance data in the bit stream.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation of application Ser. No. 11/367,453, filed Mar. 6, 2006, which claims the benefit of Provisional Application No. 60/658,166, filed Mar. 4, 2005. The entire disclosures of the prior applications, application Ser. Nos. 11/367,453 and 60/658,166, are considered part of the disclosure of the accompanying continuation application and are hereby incorporated by reference.

This application claims priority from Korean Patent Application No. 10-2005-0036289, filed on Apr. 29, 2005 in the Korean Intellectual Property Office, and U.S. Provisional Patent Application No. 60/658,166, filed on Mar. 4, 2005 in the United States Patent and Trademark Office, the disclosures of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a color space scalable video coding and decoding method and an apparatus for the same, and more particularly, to a color space scalable video coding and decoding method and an apparatus for the same that can adjust color components or color depth according to the performance of a decoder side.

2. Description of the Prior Art

With the development of information and communication technologies, multimedia communications are increasing in addition to text and voice communications. The existing text-centered communication systems are insufficient to satisfy consumers' diverse desires, and thus multimedia services that can accommodate diverse forms of information such as text, images, music, and others, are increasing. Since multimedia data is large, mass storage media and wide bandwidths are required for storing and transmitting multimedia data. Accordingly, compression coding techniques are required to transmit multimedia data, which includes text, images and audio data.

The basic principle of data compression is to remove data redundancy. Data can be compressed by removing spatial redundancy, such as the repetition of the same color or object in images; temporal redundancy, such as little change in adjacent frames of a moving image or the continuous repetition of sounds in audio; and visual/perceptual redundancy, which considers human insensitivity to high frequencies. In a general video coding method, temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by a spatial transform.

In order to transmit multimedia after the redundancy has been removed, transmission media are required, and the performances of these media differ. Presently used transmission media have diverse transmission speeds. For example, an ultrahigh-speed communication network can transmit several tens of megabits of data per second, while a mobile communication network has a transmission speed of 384 kilobits per second. In order to support such transmission media, and to transmit multimedia at a rate suited to the transmission environment, a scalable video coding method is most suitable.

This scalable coding method makes it possible to adjust the resolution, the frame rate, the signal-to-noise ratio (SNR), and other properties of a video by truncating part of a pre-compressed bit stream in accordance with environmental conditions such as the transmission bit rate, the transmission error rate, and system resources. With respect to such scalable video coding, standardization work has already progressed in MPEG-21 (Moving Picture Experts Group-21) Part 13.

However, since the existing scalable video coding cannot provide scalability in a color space, even a display device that requires a grayscale image instead of a color image must receive and decode the color image, which is unnecessary and inefficient. Further, it is inefficient for a display device that is unable to display an image having a color depth of 24 bits to receive and decode a bit stream coded with a color depth of 24 bits and then truncate unnecessary bits from the decoded bit stream.

SUMMARY OF THE INVENTION

Accordingly, the present invention has been made to solve the above-mentioned problems occurring in the prior art, and an aspect of the present invention is to provide a color space scalable video coding and decoding method, in which an encoder can inform a decoder of the position of luminance data in a bit stream and the decoder can transform a color image into a grayscale image as needed.

Another aspect of the present invention is to provide a color space scalable video coding and decoding method, in which a decoder acquires information on a color depth capacity from a display device, removes bits that exceed the color depth capacity supported by the display device, and decodes the bit stream.

Additional advantages, aspects and features of the invention will be set forth in part in the description which follows, and in part will become apparent to those having ordinary skill in the art upon examination of the following, or may be learned from practice of the invention.

In order to accomplish these aspects, there is provided a color space scalable video coding method, according to the present invention, which includes the steps of generating transform coefficients by removing the temporal redundancy and spatial redundancy of input video frames, quantizing the transform coefficients, generating a bit stream by entropy coding the quantized transform coefficients, and generating a color space scalable bit stream that includes the bit stream and position information of luminance data in the bit stream.

In another aspect of the present invention, there is provided a color space scalable video decoding method, which includes the steps of extracting position information of luminance data from a bit stream, generating a second bit stream that includes motion data and luminance data by truncating chrominance data from the bit stream according to the position information of the luminance data, and restoring video frames by decoding the second bit stream.

In still another aspect of the present invention, there is provided a color space scalable video coding method, which includes the steps of generating transform coefficients by removing the temporal redundancy and the spatial redundancy of input video frames, quantizing the transform coefficients, and generating a bit stream by entropy coding the quantized transform coefficients, wherein the number of bits of the color depth of the bit stream is increased in proportion to the level of the layer.

In still another aspect of the present invention, there is provided a color space scalable video decoding method, which includes the steps of acquiring information on a color depth capacity from a display device, generating a second bit stream by truncating bits from an input bit stream that exceed the color depth capacity according to the information on the color depth capacity, and restoring video frames by decoding the second bit stream.

In still another aspect of the present invention, there is provided a color space scalable video encoder, which includes a temporal transform unit for removing the temporal redundancy of input video frames, a spatial transform unit for removing the spatial redundancy of the input video frames, a quantization unit for quantizing the transform coefficients generated by the temporal transform unit and the spatial transform unit, an entropy coding unit for performing entropy coding of the quantized transform coefficients, and a color space scalable bit stream generation unit for generating a color space scalable bit stream that includes a bit stream generated by the entropy coding unit and position information of luminance data in the bit stream.

In still another aspect of the present invention, there is provided a color space scalable video decoder, which includes a bit stream preprocessing unit for extracting position information of luminance data from a bit stream and generating a second bit stream that includes motion data and luminance data by truncating chrominance data from the bit stream according to the position information of the luminance data, an entropy decoding unit for decoding the second bit stream, an inverse quantization unit for generating transform coefficients by performing inverse quantization on the decoded second bit stream, an inverse spatial transform unit for restoring a residual signal by performing an inverse spatial transform on the transform coefficients, and a motion compensation unit for performing motion compensation on predicted frames according to motion data provided by the entropy decoding unit.

In still another aspect of the present invention, there is provided a color space scalable video decoder, which includes a bit stream preprocessing unit for acquiring information on the color depth capacity from a display device and generating a second bit stream by truncating bits from an input bit stream that exceed the color depth capacity according to the information on the color depth capacity, an entropy decoding unit for decoding the second bit stream, an inverse quantization unit for generating transform coefficients by performing inverse quantization on the decoded second bit stream, an inverse spatial transform unit for restoring a residual signal by performing an inverse spatial transform on the transform coefficients, and a motion compensation unit for performing motion compensation on predicted frames according to motion data provided by the entropy decoding unit.

In still another aspect of the present invention, there is provided a method of transferring data of a slice that contains a plurality of macroblocks, which includes the steps of inserting luminance data of all the macroblocks contained in the slice, inserting chrominance data of all the macroblocks contained in the slice, and transferring a bit stream that includes the luminance data and the chrominance data.

In still another aspect of the present invention, there is provided a method of generating a video sequence that includes a plurality of slices containing a plurality of macroblocks with luminance data and chrominance data of the macroblocks, which includes the steps of inserting the luminance data of all the macroblocks included in the slice, and inserting the chrominance data of all the macroblocks included in the slice.

In still another aspect of the present invention, there is provided a method of processing a video sequence in which the luminance data and the chrominance data of the plurality of macroblocks included in a slice are transferred separately, which includes the steps of interpreting the luminance data of all the macroblocks included in the slice, and interpreting the chrominance data of all the macroblocks included in the slice.

In still another aspect of the present invention, there is provided a method of decoding a video sequence that includes a base layer and an FGS enhancement layer, which includes the steps of interpreting data of the base layer, interpreting the luminance data of all the macroblocks included in the FGS enhancement layer, interpreting the chrominance data of all the macroblocks, combining the luminance data and the chrominance data of the FGS enhancement layer with the data of the base layer, and decoding the combined data.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is an exemplary view illustrating the concept of color space scalable video coding according to an exemplary embodiment of the present invention;

FIG. 2 is a view illustrating the structure of a color space scalable bit stream according to an exemplary embodiment of the present invention;

FIG. 3 is a view illustrating the structure of a color space scalable bit stream in a multilayer structure according to an exemplary embodiment of the present invention;

FIG. 4 is a view illustrating the structure of a color space scalable bit stream in an FGS layer structure according to an exemplary embodiment of the present invention;

FIG. 5 is a view illustrating the structure of a color space scalable bit stream in an FGS layer structure according to another exemplary embodiment of the present invention;

FIG. 6 is a block diagram illustrating the construction of a color space scalable video encoder according to an exemplary embodiment of the present invention;

FIG. 7 is a block diagram illustrating the construction of a color space scalable video decoder according to an exemplary embodiment of the present invention;

FIG. 8 is a block diagram illustrating the construction of a color space scalable video encoder in an FGS layer structure according to an exemplary embodiment of the present invention;

FIG. 9 is a block diagram illustrating the construction of a color space scalable video decoder in an FGS layer structure according to an exemplary embodiment of the present invention;

FIG. 10 is a flowchart illustrating an encoding process according to an exemplary embodiment of the present invention;

FIG. 11 is a flowchart illustrating a color component scalable video decoding process according to an exemplary embodiment of the present invention; and

FIG. 12 is a flowchart illustrating a color depth scalable video decoding process according to an exemplary embodiment of the present invention.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. The aspects and features of the present invention, and methods for achieving them, will become apparent from the exemplary embodiments described in detail below with reference to the accompanying drawings. However, the present invention is not limited to the exemplary embodiments disclosed hereinafter, but can be implemented in diverse forms. The matters defined in the description, such as the detailed construction and elements, are nothing but specific details provided to assist those of ordinary skill in the art in a comprehensive understanding of the invention, and the present invention is only defined within the scope of the appended claims. Throughout the description of the present invention, the same drawing reference numerals are used for the same elements across the various figures.

FIG. 1 is an exemplary view illustrating the concept of color space scalable video coding according to an exemplary embodiment of the present invention.

A security system 20 not only displays a video in real time but also stores video data for further use. The storage of video data requires a large disk space, and if the video data is stored as a grayscale image rather than a color image, the disk space for storing the image can be reduced. Accordingly, it is necessary to provide a function for transforming a color image into a grayscale image according to the requirements of a display device. This function is hereinafter referred to as “color component scalability”.

On the other hand, one bit stream transmitted from an encoder side may be scaled in color depth and transmitted to various clients. Generally, a mobile phone 30 and a PDA 40 provide color depths lower than those of a notebook computer 50 and a PC 60. The function of scaling the same bit stream for display devices that support diverse color depths is called “color depth scalability”.

Hereinafter, the color component scalability and the color depth scalability described above are together called color space scalability. This color space scalability can be implemented by pre-decoders or extractors 11 to 14.

Since most digital video applications display color video, a mechanism that captures and represents color information is required. A black/white image requires only one numeral to represent the luminance of a respective spatial sample. By contrast, a color image requires at least three numerals for each pixel in order to accurately represent the color. A color space is selected to represent the luminance and the color. Color spaces may be classified into the RGB color space, the YUV color space and the YCrCb color space. In the RGB color space, a color image sample is represented by three numerals that indicate the relative ratios of red, green and blue. Since the three colors are equally important in the RGB color space, they are stored with the same resolution. However, the human visual system is more sensitive to luminance than to chrominance, and thus the color image can be represented more efficiently by separating the luminance information from the color information, and representing the luminance data with a higher resolution than that of the chrominance data.

The YCrCb color space and the YUV color space, which is a modification of the YCrCb color space, are popular methods for effectively representing a color image in consideration of the human visual system, as described above. Y denotes the luminance component, and can be calculated as a weighted average of R, G and B as in Equation (1):

Y = k_r R + k_g G + k_b B   (1)

where k_r, k_g and k_b are weighting factors.

The color information can be expressed by chrominance components, and each chrominance component can be expressed as the difference between R (or G or B) and Y, as in Equation (2):

Cb = B − Y
Cr = R − Y
Cg = G − Y   (2)
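For illustration only, the separation of Equations (1) and (2) can be sketched in a few lines of Python. The weight values used here (0.299, 0.587, 0.114) are the well-known BT.601 luma weights and are an assumption for this sketch; the description above leaves k_r, k_g and k_b unspecified.

    def rgb_to_luma_chroma(r, g, b, kr=0.299, kg=0.587, kb=0.114):
        """Split an RGB sample into luminance Y (Equation 1) and
        chrominance differences Cb, Cr, Cg (Equation 2).
        The kr/kg/kb defaults are assumed BT.601 weights."""
        y = kr * r + kg * g + kb * b   # Equation (1)
        cb = b - y                     # Equation (2)
        cr = r - y
        cg = g - y
        return y, cb, cr, cg

    # A pure-gray sample (r == g == b) yields zero chrominance:
    print(rgb_to_luma_chroma(128, 128, 128))  # (128.0, 0.0, 0.0, 0.0)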

Accordingly, by separating the luminance component and the chrominance components from each other, and encoding the separated components, a color bit stream can be transformed into a grayscale bit stream.

FIG. 2 is a view illustrating the structure of a color component scalable bit stream in which a luminance component and chrominance components are separated from each other and encoded, according to an exemplary embodiment of the present invention.

Referring to FIG. 2, luminance data 230 is encoded first, and then chrominance data 240 is encoded. Position information of the luminance data 230 is inserted into a header 210 of the bit stream. The position information of the luminance data may include the entire length of the texture data and the length of the luminance data. In addition, the position information of the luminance data may include any type of information that indicates the position of the luminance data so that the bit stream can be truncated by a decoder. The position information of the luminance data can be inserted into a GOP (Group Of Pictures) header, a picture header, a slice header, or any proper position in the bit stream.
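The byte-level syntax of the header 210 is not specified above, so the following Python sketch only illustrates the principle of FIG. 2: luminance data first, chrominance data second, and enough position information in the header for a pre-decoder to cut the stream. The two 32-bit header fields and all names are hypothetical.

    import struct

    def pack_slice(luma: bytes, chroma: bytes) -> bytes:
        # Hypothetical header: total texture length, then luminance
        # length, each as a 32-bit big-endian integer.
        header = struct.pack(">II", len(luma) + len(chroma), len(luma))
        return header + luma + chroma

    def extract_grayscale(bitstream: bytes) -> bytes:
        # A pre-decoder reads the header and truncates the chrominance
        # part, keeping only the header and the luminance data.
        total_len, luma_len = struct.unpack(">II", bitstream[:8])
        return bitstream[:8 + luma_len]

    packed = pack_slice(b"Y" * 6, b"C" * 4)
    print(len(packed), len(extract_grayscale(packed)))  # 18 14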

The decoder can restore grayscale video frames by extracting only the luminance component from a color bit stream. In this exemplary embodiment of the present invention, color component scalability can thus be achieved through a very simple implementation of the decoder; color depth scalability, however, cannot be achieved in this manner.

The color depth scalability refers to a function required by the decoder of a display device that cannot display an image with a color depth of 24 bits, e.g., a PDA or a mobile phone. Accordingly, the decoder must provide a bit stream that uses a color depth suitable for the respective display device. Resources such as bandwidth and decoding time are wasted when a display device such as a PDA or mobile phone processes a bit stream having the complete color depth.

The color depth scalability can be achieved by making the decoder acquire, from the client, information on the color depth capacity supported by that client, and then removing from the bit stream the bits that exceed this capacity. The implementation of the color depth scalability will be explained with reference to FIGS. 3 and 4.

FIG. 3 is a view illustrating the structure of a color space scalable bit stream that provides color component scalability and color depth scalability using a multilayer structure according to an exemplary embodiment of the present invention.

In FIG. 3, the color component scalability and the color depth scalability are provided in a multilayer structure. In this exemplary embodiment of the present invention, the bit stream includes a plurality of layers that contain texture data having different color depths. Specifically, the first layer may include texture data that supports a 12-bit color depth, the second layer may include texture data that supports a 16-bit color depth, and the third layer may include texture data that supports a 24-bit color depth. If an encoder encodes and transmits the texture data having different color depths using the multilayer structure, a decoder acquires information on the color depth capacity supported by the display device, removes from the received bit stream the layers that exceed the supportable color depth capacity, and then decodes the remaining bit stream to restore the video frames. For example, if the display device is a PDA that supports a 16-bit color depth, the decoder removes the bit stream corresponding to the third layer, and decodes the bit streams corresponding to the first layer and the second layer in order to display the result.
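A minimal sketch of this layer-dropping decision, assuming the 12-, 16- and 24-bit layer arrangement of the example above (the list layout and names are purely illustrative):

    # Layers ordered from base to highest, with the color depth each one
    # supports, following the example in the text.
    layers = [("layer1", 12), ("layer2", 16), ("layer3", 24)]

    def select_layers(supported_depth: int):
        """Keep only the layers whose color depth the display supports."""
        return [name for name, depth in layers if depth <= supported_depth]

    print(select_layers(16))  # ['layer1', 'layer2'] -- the PDA example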

On the other hand, the bit stream corresponding to each layer includes position information 310 and 340 of the luminance data for discriminating the luminance data 320 and 350 from the chrominance data 330 and 360 in the respective layer, and thus the color component scalability can be realized in each layer. Accordingly, in the case where the display device supports a 16-bit color depth and grayscale, the decoder can restore the video frames by truncating the bit stream corresponding to the third layer and truncating the unnecessary chrominance data 330 and 360 according to the position information 310 and 340 of the luminance data of the first and second layers. The position information of the luminance data can be inserted into a GOP header, a picture header, a slice header, or any proper position in the bit stream. The position information of the luminance data may include the entire length of the texture data and the length of the luminance data. In addition, the position information of the luminance data may include any type of information that can indicate the position of the luminance data so that the bit stream can be truncated by the decoder.

In this exemplary embodiment of the present invention, the structure of a bit stream that can support both the color component scalability and the color depth scalability has been exemplified. However, if only the color depth scalability using the multilayer structure is to be supported, the position information 310 and 340 of the luminance data of the respective layers can be omitted.

FIG. 4 is a view illustrating the structure of a color space scalable bit stream in an FGS (Fine Grain SNR scalability) layer structure according to an exemplary embodiment of the present invention. In FIG. 4, the color component scalability and the color depth scalability are provided in an FGS layer structure. The FGS technique implements SNR scalability by decoding an input video into two layers that have the same frame rate and resolution but different accuracies of quantization. In particular, the FGS technique encodes the input video into two layers, i.e., a base layer and an enhanced layer, and encodes a residual signal in the enhanced layer. The encoded enhanced-layer signals may or may not be transmitted, or may be left undecoded by the decoder, according to the network transmission efficiency or the state of the decoder side. Accordingly, the amount of data transmitted can be properly adjusted according to the transmission bit rate of the network.

FGS of the SVM (Scalable Video Model) 3.0 is implemented using a gradual refinement representation. The SNR scalability in FGS is achieved by making it possible to truncate the network abstraction layer (NAL) units generated as the result of FGS encoding at any place. FGS is composed of a base layer and an FGS enhanced layer. The base layer generates base layer frames that represent the minimum quality of the video that can be transmitted at the lowest transmission bit rate, and the FGS enhanced layer generates the NAL units that may be properly truncated and transmitted at a bit rate higher than the lowest transmission bit rate, or that may be properly truncated and decoded by the decoder. The FGS enhanced layer transforms and quantizes the residual signal obtained by subtracting the restored frames, which have been obtained in the base layer or the lower enhanced layer, from the original frames, and transmits the quantized residual signal to the decoder. As the layer level increases, the SNR scalability is realized by generating a finer residual through reduced quantization parameter values.
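For illustration, the residual that an FGS enhanced layer encodes can be sketched as a per-sample difference between the original frame and the frame restored from the layer below; modeling frames as flat lists of sample values is an illustrative simplification.

    def enhancement_residual(original, base_reconstruction):
        """Residual the FGS enhanced layer encodes: the difference between
        the original frame and the frame restored from the layer below."""
        return [o - r for o, r in zip(original, base_reconstruction)]

    original = [100, 102, 98, 97]
    base_rec = [96, 104, 96, 100]   # coarsely quantized base-layer output
    print(enhancement_residual(original, base_rec))  # [4, -2, 2, -3]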

In this exemplary embodiment of the present invention, the color depth scalability is realized using three FGS layers, i.e., a first FGS layer (the base layer), a second FGS layer (a first FGS enhanced layer), and a third FGS layer (a second FGS enhanced layer).

Generally, in the FGS layer structure of SVM 3.0, if the number of layers is increased by one, the number of bits that can be used for the texture data is also increased by one. Using this property to increase the color depth capacity, the second FGS layer can support a color depth that is one bit larger than that of the first FGS layer, and the third FGS layer can support a color depth that is one bit larger than that of the second FGS layer. If the encoder encodes and transmits the texture data having different color depths using the FGS layer structure, the decoder acquires information on the color depth capacity supported by the display device, removes from the received bit stream the layers that exceed the supportable color depth capacity, and then decodes the remaining bit stream to restore the video frames.
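A sketch of how a decoder might decide how many FGS layers to keep, assuming each enhancement layer adds exactly one bit of color depth as described above; the 8-bit base depth in the usage example is an assumption.

    def fgs_layers_to_keep(base_depth: int, num_layers: int,
                           display_depth: int) -> int:
        """Each FGS layer above the base adds one bit of color depth, so
        the decoder keeps as many layers as the display's depth allows."""
        usable_extra_bits = max(0, display_depth - base_depth)
        return min(num_layers, 1 + usable_extra_bits)  # base always kept

    # Base layer at 8 bits plus two enhancement layers (9- and 10-bit):
    print(fgs_layers_to_keep(base_depth=8, num_layers=3, display_depth=9))  # 2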

In this exemplary embodiment of the present invention, the bit stream may include position information 410 of the luminance data for discriminating the luminance data 420, 440 and 460 of all the FGS layers from the chrominance data 430, 450 and 470 in order to support the color component scalability. Accordingly, if the display device supports the color depth corresponding to the second FGS layer and grayscale, the decoder can restore the video frames by truncating the bit stream corresponding to the third layer and truncating the unnecessary chrominance data 430 and 450 according to the position information 410 of the luminance data of the first and second layers. The position information of the luminance data can be inserted into a GOP header, a picture header, a slice header, or any proper position in the bit stream. In this exemplary embodiment of the present invention, the position information of the luminance data may include the entire length of the texture data and the length of the luminance data. In addition, the position information of the luminance data may include any type of information that can indicate the position of the luminance data so that the bit stream can be truncated by the decoder.

In this exemplary embodiment of the present invention, the structure of a bit stream that can support both the color component scalability and the color depth scalability has been exemplified. However, if only the color depth scalability using FGS is to be supported, the position information 410 of the luminance data of the respective FGS layers can be omitted.

FIG. 5 is a view illustrating the structure of a color space scalable bit stream in an FGS layer structure according to another exemplary embodiment of the present invention.

In the exemplary embodiment illustrated in FIG. 5, the bit stream has a structure for color space scalable coding and decoding using FGS, in the same manner as the structure of FIG. 4. However, while the exemplary embodiment illustrated in FIG. 4 provides a structure in which each FGS layer carries texture data composed of its luminance data followed by its chrominance data, the exemplary embodiment illustrated in FIG. 5 provides a structure in which the luminance data 520 to 540 of all the FGS layers are separated from the chrominance data 550 to 570 of all the FGS layers and arranged at the head of the bit stream. In this exemplary embodiment of the present invention, the color depth scalability and the color component scalability are realized using three FGS layers, i.e., a first FGS layer (the base layer), a second FGS layer (a first FGS enhanced layer), and a third FGS layer (a second FGS enhanced layer). The decoder acquires information on the color depth capacity supported by the display device, removes from the received bit stream the layers that exceed the supportable color depth capacity, and then decodes the remaining bit stream to restore the video frames. In particular, in this exemplary embodiment of the present invention, the bit stream may include position information 510 of the boundary between the luminance data 540 and the chrominance data 550 in order to support the color component scalability. In this case, since the maximum amount of luminance data can be used by giving up the chrominance data, a grayscale image of high sharpness can be restored.

FIG. 6 is a block diagram illustrating the construction of a color space scalable video encoder according to an exemplary embodiment of the present invention.

Referring to FIG. 6, the color space scalable video encoder 600 according to this exemplary embodiment of the present invention includes a temporal transform unit 610, a spatial transform unit 620, a quantization unit 630, an entropy coding unit 640, a color space scalable bit stream generation unit 650, an inverse quantization unit 660, and an inverse spatial transform unit 670. The temporal transform unit 610 may include a motion estimation unit 612, a motion compensation unit 614, and a subtracter 616.

The motion estimation unit 612 performs motion estimation on the present frame based on a reference frame among the input video frames, and obtains motion vectors. A widely used algorithm for motion estimation is the block matching algorithm, which estimates, as the motion vector, the displacement that yields the minimum error while moving a given motion block pixel by pixel within a specified search area of the reference frame. For the motion estimation, a motion block having a fixed size, or a motion block having a variable size according to a hierarchical variable size block matching (HVSBM) algorithm, may be used. The motion estimation unit 612 provides motion data, such as the motion vectors obtained as the result of motion estimation, the size of the motion block, and the reference frame number, to the entropy coding unit 640.
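A minimal full-search sketch of the block matching algorithm described above, using the sum of absolute differences (SAD) as the error measure; frames are modeled as plain 2-D lists and all names are illustrative.

    def block_match(ref, cur, bx, by, bsize, search):
        """Return the displacement (dx, dy) that minimizes the SAD for the
        block at (bx, by) of the current frame over the search area."""
        h, w = len(ref), len(ref[0])
        best = None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                if not (0 <= by + dy and by + dy + bsize <= h and
                        0 <= bx + dx and bx + dx + bsize <= w):
                    continue  # candidate block falls outside the frame
                sad = sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
                          for j in range(bsize) for i in range(bsize))
                if best is None or sad < best[0]:
                    best = (sad, dx, dy)
        return best[1], best[2]

    # The current frame is the reference shifted right by two pixels, so
    # the best match for a block is found at a displacement of (-2, 0):
    ref = [[(x + y) % 251 for x in range(16)] for y in range(16)]
    cur = [[ref[y][(x - 2) % 16] for x in range(16)] for y in range(16)]
    print(block_match(ref, cur, bx=4, by=4, bsize=4, search=3))  # (-2, 0)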

The motion compensation unit 614 reduces the temporal redundancy of the input video frame. In this case, the motion compensation unit 614 generates a temporally predicted frame for the present frame by performing motion compensation on the reference frame using the motion vectors calculated by the motion estimation unit 612.

The subtracter 616 removes the temporal redundancy of the video by subtracting the temporally predicted frame from the present frame.

The spatial transform unit 620 removes spatial redundancy from the frame from which the temporal redundancy has been removed by the subtracter 616, using a spatial transform method that supports spatial scalability. The discrete cosine transform (DCT), the wavelet transform, and others may be used as the spatial transform method. Coefficients obtained as the result of the spatial transform are called transform coefficients. If the DCT is used as the spatial transform method, the resulting coefficients are called DCT coefficients, while if the wavelet transform is used, the resulting coefficients are called wavelet coefficients.

The quantization unit 630 quantizes the transform coefficients obtained by the spatial transform unit 620. Quantization means representing the transform coefficients, which are expressed as real values, by discrete values, by dividing the transform coefficients into specified sections and then matching the discrete values to specified indexes. In particular, in the case of using the wavelet transform as the spatial transform method, an embedded quantization method is mainly used. An embedded quantization method performs efficient quantization exploiting spatial redundancy by preferentially coding the components of the transform coefficients that exceed a threshold value, while successively halving the threshold value. The embedded quantization method may be the embedded zerotrees wavelet algorithm (EZW), the set partitioning in hierarchical trees algorithm (SPIHT), or the embedded zeroblock coding algorithm (EZBC).
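The successive-halving idea behind embedded quantization can be sketched as a series of significance passes; this bare illustration shows only the threshold mechanism, not an implementation of EZW, SPIHT or EZBC.

    def embedded_passes(coeffs, num_passes):
        """Code coefficient significance against a threshold that is halved
        on every pass, so the output can be cut after any pass and still
        decode to a coarser approximation."""
        threshold = 2 ** (max(abs(c) for c in coeffs).bit_length() - 1)
        passes = []
        for _ in range(num_passes):
            passes.append([1 if abs(c) >= threshold else 0 for c in coeffs])
            threshold //= 2
        return passes

    print(embedded_passes([34, -20, 10, -3], num_passes=3))
    # [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]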

The entropy coding unit 640 performs lossless coding of the transform coefficients quantized by the quantization unit 630 and the motion data provided by the motion estimation unit 612, and generates an output bit stream. Arithmetic coding or variable length coding may be used as the lossless coding method.

The color space scalable bit stream generation unit 650 inserts, in a proper form, position information of the luminance data in the texture data provided by the quantization unit 630 into the bit stream provided by the entropy coding unit 640. The form of the bit stream generated by the color space scalable bit stream generation unit 650 is as described above with reference to FIG. 2.

In another exemplary embodiment of the present invention, the color space scalable bit stream generation unit 650 may instead insert the position information of the luminance data into the header of the texture data quantized by the quantization unit 630, rather than into the header of the entire bit stream, before providing the texture data to the entropy coding unit 640. In this case, a decoder side 700 can extract the position information of the luminance data from the header of the texture data after decoding the bit stream.

In the case where the video encoder 600 supports closed-loop encoding in order to reduce the drifting error that occurs between the encoder side and the decoder side, it may further include an inverse quantization unit 660 and an inverse spatial transform unit 670.

The inverse quantization unit 660 performs inverse quantization on the coefficients quantized by the quantization unit 630. This inverse quantization process corresponds to the inverse of the quantization process.

The inverse spatial transform unit 670 performs an inverse spatial transform on the results of the inverse quantization, and provides the results of the inverse spatial transform to an adder 680.

The adder 680 restores the video frame by adding the residual frame provided by the inverse spatial transform unit 670 to the previous frame provided by the motion compensation unit 614 and stored in a frame buffer (not illustrated), and provides the restored video frame to the motion estimation unit 612 as the reference frame.

With reference to FIG. 6, a single layer video encoder has been explained. However, it will be apparent to those skilled in the art that the video encoder according to the present invention can be extended to color space scalable video coding using a multilayer structure as illustrated in FIG. 3.

FIG. 7 is a block diagram illustrating the construction of a color space scalable video decoder according to an exemplary embodiment of the present invention.

Referring to FIG. 7, the color space scalable video decoder 700 according to this exemplary embodiment of the present invention includes a bit stream preprocessing unit 710, an entropy decoding unit 720, an inverse quantization unit 730, an inverse spatial transform unit 740, and a motion compensation unit 750.

The bit stream preprocessing unit 710 acquires information on the supportable color space from the display device, truncates the received bit stream according to the color space information, and provides the truncated bit stream to the entropy decoding unit 720. The information on the color space supported by the display device may be information on the displayed color/grayscale image, the color depth capacity, and others.

In the case where the display device supports only the grayscale image, as described above with reference to FIGS. 2 and 3, the bit stream preprocessing unit 710 extracts the position information of the luminance data from the bit stream, truncates the part corresponding to the chrominance data from the texture data, and provides the bit stream that includes only the motion data and the luminance data to the entropy decoding unit 720. Also, the bit stream preprocessing unit 710 may truncate the bits or the layer that exceed the color depth capacity supported by the display device, and provide the remaining bit stream to the entropy decoding unit 720.
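A sketch of this preprocessing decision, viewing the stream as a list of tagged units; the unit kinds and the display-information keys are hypothetical illustrations of the color space information described above.

    def preprocess(units, display):
        """units: (kind, payload) tuples in stream order, where kind is
        'motion', 'luma', 'chroma' or 'depth_enh'. Both the tuple layout
        and the display keys are illustrative assumptions."""
        kept = []
        for kind, payload in units:
            if kind == "chroma" and display.get("grayscale_only"):
                continue              # truncate chrominance data
            if kind == "depth_enh" and not display.get("high_depth"):
                continue              # truncate bits beyond the capacity
            kept.append((kind, payload))
        return kept

    stream = [("motion", b"m"), ("luma", b"y"),
              ("chroma", b"c"), ("depth_enh", b"d")]
    print([k for k, _ in preprocess(stream, {"grayscale_only": True})])
    # ['motion', 'luma']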

In another exemplary embodiment of the present invention, without the preprocessing by the bit stream preprocessing unit 710, the entropy decoding unit 720 may decode the received bit stream, extract the texture data, extract the position information of the luminance data included in the header part of the texture data, and truncate the chrominance data if needed.

The entropy decoding unit 720 extracts motion data and texture data by performing lossless decoding, which is the inverse of the entropy encoding. The entropy decoding unit 720 provides the extracted texture data to the inverse quantization unit 730, and provides the extracted motion data to the motion compensation unit 750.

The inverse quantization unit 730 performs inverse quantization on the texture data transmitted from the entropy decoding unit 720. This inverse quantization process searches for the quantized coefficients that match the values, expressed by specified indexes, transferred from the encoder side 600. A table that represents the mapping between indexes and quantization coefficients may be transferred from the encoder side 600, or may be prepared in advance by an agreement between the encoder and the decoder.
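A minimal sketch of the index-to-coefficient lookup described above; the table values are invented for illustration.

    # Hypothetical mapping between transmitted indexes and reconstruction
    # values; in practice the table is sent by the encoder or agreed on
    # beforehand, as described above.
    dequant_table = {0: 0.0, 1: 2.5, 2: 7.5, 3: 15.0}

    def inverse_quantize(indexes):
        """Replace each index with the quantized coefficient it represents."""
        return [dequant_table[i] for i in indexes]

    print(inverse_quantize([3, 0, 1]))  # [15.0, 0.0, 2.5]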

The inverse spatial transform unit 740 performs the inverse of the spatial transform, restoring the coefficients generated as the results of the inverse quantization to a residual image in the spatial domain. For example, if the coefficients were spatially transformed by a wavelet transform method on the video encoder side, the inverse spatial transform unit 740 performs the inverse wavelet transform, while if the coefficients were transformed by a DCT method, it performs the inverse DCT.

The motion compensation unit 750 performs motion compensation of the restored video frames, generating motion compensated frames using the motion data provided by the entropy decoding unit 720. Of course, this motion compensation process can be performed only when the present frame was encoded through the temporal prediction process on the encoder side.

An adder 760 restores the video frames by adding the residual image to the motion compensated frames provided by the motion compensation unit 750, when the residual image restored by the inverse spatial transform unit was generated by temporal prediction.

FIG. 8 is a block diagram illustrating the construction of a color space scalable video encoder in an FGS layer structure according to an exemplary embodiment of the present invention.

Referring to FIG. 8, the encoder according to this exemplary embodiment of the present invention may briefly include a base layer encoder 810 and an enhancement layer encoder 850. In this exemplary embodiment of the present invention, the use of a base layer and one enhancement layer is exemplified. However, it will be apparent to those skilled in the art that the present invention can also be applied to cases where more layers are used.

The base layer encoder 810 may include a motion estimation unit 812, a motion compensation unit 814, a spatial transform unit 818, a quantization unit 820, an entropy coding unit 822, a color space scalable bit stream generation unit 832, an inverse quantization unit 824, an inverse spatial transform unit 826, and a deblocking unit 830.

The motion estimation unit 812 performs motion estimation of the present frame based on a reference frame among the input video frames, and obtains motion vectors. In this exemplary embodiment of the present invention, the motion vectors for prediction are obtained by receiving the restored frame that has been deblocked by the deblocking unit 830. The motion estimation unit 812 provides motion data, such as the motion vectors obtained as the result of motion estimation, the size of the motion block, and the reference frame number, to the entropy coding unit 822.

The motion compensation unit 814 generates a temporally predicted frame for the present frame by performing motion compensation on a forward or backward reference frame using the motion vectors calculated by the motion estimation unit 812.

The subtracter 816 removes the temporal redundancy of the video by subtracting the temporally predicted frame provided by the motion compensation unit 814 from the present input frame.

The quantization unit 820 quantizes the transform coefficients obtained by the spatial transform unit 818.

The entropy coding unit 822 performs lossless coding of the transform coefficients quantized by the quantization unit 820 and the motion data provided by the motion estimation unit 812, and generates an output bit stream.

The color space scalable bit stream generation unit 832 inserts, in a proper form, the position information of the luminance data among the texture data provided by the quantization unit 820 into the bit stream provided by the entropy coding unit 822. The form of the bit stream generated by the color space scalable bit stream generation unit 832 is as described above with reference to FIGS. 4 and 5.

In another exemplary embodiment of the present invention, the color space scalable bit stream generation unit 832 may instead insert the position information of the luminance data into the header part of the texture data quantized by the quantization unit 820, rather than into the header part of the entire bit stream, before providing the texture data to the entropy coding unit 822. In this case, a decoder side 900 can extract the position information of the luminance data from the header of the texture data after decoding the bit stream.

In the case where the video encoder 800 supports closed-loop encoding in order to reduce the drifting error that occurs between the encoder side and the decoder side, it may further include an inverse quantization unit 824 and an inverse spatial transform unit 826.

The deblocking unit 830 receives the restored video frames from an adder 828 and performs deblocking to remove the artifacts caused by the boundaries between blocks in the frame. The deblocked restored video frame is provided to the enhancement layer encoder 850 as the reference frame.

The enhancement layer encoder 850 may include a spatial transform unit 854, a quantization unit 856, an entropy coding unit 868, an inverse quantization unit 858, an inverse spatial transform unit 860, and a deblocking unit.

The color space scalable bit stream generation unit 870 inserts, in a proper form, the position information of the luminance data among the texture data provided by the quantization unit 856 into the bit stream provided by the entropy coding unit 868. The form of the bit stream generated by the color space scalable bit stream generation unit 870 is as described above with reference to FIGS. 4 and 5.

In another exemplary embodiment of the present invention, the color space scalable bit stream generation unit 870 may instead insert the position information of the luminance data into the header part of the texture data quantized by the quantization unit 856, rather than into the header part of the entire bit stream, before providing the texture data to the entropy coding unit 868. In this case, the decoder side 900 can extract the position information of the luminance data from the header of the texture data after decoding the bit stream.

A subtracter 852 generates a residual frame by subtracting the reference frame provided by the base layer from the present input frame. The residual frame is encoded through the spatial transform unit 854 and the quantization unit 856, and is restored through the inverse quantization unit 858 and the inverse spatial transform unit 860.

An adder 862 generates a restored frame by adding the restored residual frame provided by the inverse spatial transform unit 860 to the reference frame provided by the base layer. The restored frame is provided to an upper enhanced layer as the reference frame.

Since the operations of the spatial transform unit 854, the quantization unit 856, the entropy coding unit 868, the inverse quantization unit 858 and the inverse spatial transform unit 860 are the same as those of the base layer, an explanation thereof is omitted.

Although a plurality of constituent elements having the same names but different identification numerals are illustrated in FIG. 8, it will be apparent to those skilled in the art that one constituent element can operate in both the base layer and the enhancement layer.

FIG. 9 is a block diagram illustrating the construction of a color space scalable video decoder in an FGS layer structure according to an exemplary embodiment of the present invention.

Referring to FIG. 9, the video decoder 900 may include a base layer decoder 910 and an enhancement layer decoder 950.

The enhancement layer decoder 950 may include a bit stream preprocessing unit 953, an entropy decoding unit 955, an inverse quantization unit 960, and an inverse spatial transform unit 965.

The bit stream preprocessing unit 953 acquires information on the supportable color space from the display device, truncates the received bit stream according to the color space information, and provides the truncated bit stream to the entropy decoding unit 955. The information on the color space supported by the display device may be information on the displayed color/grayscale image, the color depth capacity, and others.

The entropy decoding unit 955 extracts the texture data by performing lossless decoding, which is the inverse of the entropy encoding. The texture information is provided to the inverse quantization unit 960.

The inverse quantization unit 960 performs inverse quantization on the texture data transmitted from the entropy decoding unit 955. The inverse quantization process searches for the quantized coefficients that match the values, expressed by specified indexes, transferred from the encoder side 800.

The inverse spatial transform unit 965 performs the inverse of the spatial transform, restoring the coefficients created as the results of the inverse quantization to a residual image in the spatial domain.

An adder 970 restores the video frames by adding the residual image restored by the inverse spatial transform unit to the reference frame provided by the deblocking unit 940 of the base layer decoder 910.

The base layer decoder 910 may include a bit stream preprocessing unit 913, an entropy decoding unit 915, an inverse quantization unit 920, an inverse spatial transform unit 925, a motion compensation unit 930, and a deblocking unit 940.

The bit stream preprocessing unit 913 acquires information on the supportable color space from the display device, truncates the received bit stream according to the color space information, and provides the truncated bit stream to the entropy decoding unit 915. The information on the color space supported by the display device may be information on the displayed color/grayscale image, the color depth capacity, and others.

The entropy decoding unit 915 extracts the texture data and the motion data by performing lossless decoding, which is the inverse of the entropy encoding. The texture information is provided to the inverse quantization unit 920.

The motion compensation unit 930 performs motion compensation of the restored video frame using the motion data provided by the entropy decoding unit 915, and generates a motion compensated frame. This motion compensation process is applied only to the case where the present frame was encoded through a temporal prediction process on the encoder side.

An adder 935 restores the video frame by adding the residual image to the motion compensated image provided by the motion compensation unit 930, when the residual image restored by the inverse spatial transform unit 925 was generated by temporal prediction.

The deblocking unit 940, which corresponds to the deblocking unit 830 of the base layer encoder illustrated in FIG. 8, generates the base layer frame by deblocking the restored video frame from the adder 935, and provides the base layer frame to the adder 970 of the enhancement layer decoder 950 as the reference frame.

Since the operations of the inverse quantization unit 920 and the inverse spatial transform unit 925 are the same as those of the corresponding units in the enhancement layer, a repeated explanation thereof is omitted.

Although a plurality of constituent elements having the same names but different identification numerals are illustrated in FIG. 9, it will be apparent to those skilled in the art that one constituent element having a specified name can operate in both the base layer and the enhancement layer.

The respective constituent elements illustrated in FIGS. 6 to 9 may be implemented as software, or as hardware such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). However, the constituent elements are not limited to software or hardware. The constituent elements may be constructed so as to reside in an addressable storage medium or to execute on one or more processors. The functions provided by the constituent elements may be implemented by subdivided constituent elements, and the constituent elements and the functions they provide may be combined to perform a specified function. In addition, the constituent elements may be implemented so as to execute on one or more computers in a system.

FIG. 10 is a flowchart illustrating an encoding process according to an exemplary embodiment of the present invention.

Referring to FIG. 10, the temporal transform unit 610 and the spatial transform unit 620 of the video encoder 600 according to an exemplary embodiment of the present invention remove the temporal redundancy and the spatial redundancy of the input video frames S1010. In this case, the spatial redundancy may be removed after the temporal redundancy is removed, or the temporal redundancy may be removed after the spatial redundancy is removed. The quantization unit 630 quantizes the transform coefficients generated as the result of removing the temporal redundancy and the spatial redundancy S1020. The entropy coding unit 640 generates a bit stream by encoding the quantized transform coefficients S1030. The color space scalable bit stream generation unit 650 generates a color space scalable bit stream by adding the position information of the luminance data to the entropy-coded bit stream S1040.

FIG. 11 is a flowchart illustrating a color component scalable video decoding process according to an exemplary embodiment of the present invention.

Referring to FIG. 11, the bit stream preprocessing unit 710 of the video decoder 700 according to an exemplary embodiment of the present invention extracts the position information of the luminance data from the received bit stream S1110. The bit stream preprocessing unit 710 truncates the chrominance data from the bit stream according to the position information of the luminance data S1120. The entropy decoding unit 720 decodes the preprocessed bit stream S1130, and the inverse quantization unit 730 performs inverse quantization on the decoded bit stream S1140. Then, the inverse spatial transform unit 740 restores the video frame by performing an inverse spatial transform on the inversely quantized bit stream S1150.

FIG. 12 is a flowchart illustrating a color depth scalable video decoding process according to an exemplary embodiment of the present invention.

Referring to FIG. 12, the bit stream preprocessing unit 710 of the video decoder 700 according to an exemplary embodiment of the present invention acquires information on the color depth capacity from a display device S1210, and generates a second bit stream by truncating the bits that exceed the color depth capacity from the input bit stream according to the acquired information on the color depth capacity S1220. The entropy decoding unit 720 decodes the second bit stream S1230, and the inverse quantization unit 730 performs inverse quantization on the decoded bit stream S1240. Then, the video is restored when the inverse spatial transform unit 740 restores a residual signal by performing an inverse spatial transform on the transform coefficients, and the motion compensation unit performs motion compensation of the predicted frames according to the motion data provided by the entropy decoding unit.

As described above, the color space scalable video coding and decoding method according to the present invention produces at least one of the following effects.

First, an encoder can inform a decoder of the position of luminance data in a bit stream, and thus the decoder can transform a color image into a grayscale image as needed.

Second, the color depth scalability can be achieved in a simple manner by the decoder acquiring information on a color depth capacity from a display device, removing the bits that exceed the color depth capacity supported by the display device, and decoding the bit stream.

The exemplary embodiments of the present invention have been described for illustrative purposes, and those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

WHAT IS CLAIMED IS:

1. A method of transferring data of a slice that contains a plurality of macroblocks, comprising: (a) sequentially inserting luminance data of all the macroblocks contained in the slice into a bitstream; (b) sequentially inserting chrominance data of all the macroblocks contained in the slice into the bitstream after the inserted luminance data; and (c) transferring the bitstream that comprises the inserted luminance data and the inserted chrominance data.

2. The method as claimed in claim 1, wherein the slice is included in a fine granule SNR scalability (FGS) layer.

3. The method as claimed in claim 1, further comprising inserting position information of the luminance data and the chrominance data into the bitstream.

4. A method of generating a video sequence comprising a plurality of slices comprising a plurality of macroblocks with luminance data and chrominance data of the macroblocks, the method comprising: (a) inserting the luminance data of all the macroblocks included in a slice into the video sequence; and (b) inserting the chrominance data of all the macroblocks included in the slice into the video sequence after the inserted luminance data.

5. The method as claimed in claim 4, wherein the video sequence is a fine granule SNR scalability (FGS) layer.

6. The method as claimed in claim 4, wherein the video sequence comprises position information of the luminance data and position information of the chrominance data.
7. The method of claim 4, further comprising (c) performing an inverse discrete cosine transform on the inserted luminance and chrominance data.

8. The method of claim 7, further comprising (d) performing inverse temporal filtering on the luminance and chrominance data which has been subjected to the inverse discrete cosine transform.